A new VAD framework using statistical model and human knowledge based empirical rule
Abstract
This paper presents a new voice activity detection (VAD) framework based on empirical rules and statistical models. First, the framework efficiently detects candidate endpoints in the time domain using empirical rules derived from human knowledge and the continuity of speech. It then confirms the candidate endpoints in the transform domain, using different confirmation schemes for the beginning point and the ending point. In the transform domain, a new algorithm called sliding-window double-layer confirmation (SWDC) is proposed to confirm the endpoints accurately, and sensitive data, used for GMM training, are proposed for the detection scheme. The experiments show that the proposed VAD framework achieves better performance in various environmental conditions.
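The abstract does not spell out the empirical rules or the SWDC algorithm, so the following is only a minimal sketch of the first stage (time-domain candidate endpoint detection) under stated assumptions. The energy threshold, the minimum speech duration, and the tolerated pause length are hypothetical parameters chosen to illustrate a "speech is continuous" rule; they are not taken from the paper, and the transform-domain confirmation stage is not modeled here.

```python
from typing import List, Tuple


def frame_energies(samples: List[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each non-overlapping frame (trailing partial frame dropped)."""
    n_frames = len(samples) // frame_len
    return [
        sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def candidate_endpoints(
    energies: List[float],
    threshold: float,
    min_speech_frames: int = 3,  # assumed rule: shorter bursts are not speech
    max_gap_frames: int = 2,     # assumed rule: short pauses stay inside a segment
) -> List[Tuple[int, int]]:
    """Return candidate (begin_frame, end_frame) pairs, with end_frame exclusive."""
    segments: List[Tuple[int, int]] = []
    start = None  # index of the first active frame of the open segment
    gap = 0       # number of consecutive inactive frames seen since the last active one
    for i, energy in enumerate(energies):
        if energy > threshold:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap > max_gap_frames:
                end = i - gap + 1  # one past the last active frame
                if end - start >= min_speech_frames:
                    segments.append((start, end))
                start, gap = None, 0
    if start is not None:
        end = len(energies) - gap
        if end - start >= min_speech_frames:
            segments.append((start, end))
    return segments


if __name__ == "__main__":
    # Synthetic signal: 2 silent frames, 4 loud, 1 silent, 2 loud, 4 silent (frame length 4).
    signal = [0.0] * 8 + [1.0] * 16 + [0.0] * 4 + [1.0] * 8 + [0.0] * 16
    print(candidate_endpoints(frame_energies(signal, 4), threshold=0.5))
```

In a two-stage design like the one the abstract describes, each candidate pair returned here would be passed to a more expensive confirmation step rather than being accepted directly.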
Similar resources
An efficient voice activity detection algorithm by combining statistical model and energy detection
In this article, we present a new voice activity detection (VAD) algorithm that is based on statistical models and an empirical rule-based energy detection algorithm. Specifically, it needs two steps to separate speech segments from background noise. In the first step, the VAD detects possible speech endpoints efficiently using the empirical rule-based energy detection algorithm. However, the poss...
A New Algorithm for Voice Activity Detection Based on Wavelet Packets (RESEARCH NOTE)
Speech constitutes much of the communicated information; most other perceived audio signals do not carry nearly as much information. Indeed, many non-speech signals may be classified as ‘noise’ in human communication. The process of separating conversational speech and noise is termed voice activity detection (VAD). This paper describes a new approach to VAD which is based on the Wavelet ...
Optimal Policy Rules for Iran in a DSGE Framework (Islamic Musharakah Approach)
The aim of this paper is to determine an optimal policy rule for the Iranian economy from an Islamic perspective. This study draws on an Islamic instrument known as the Musharakah contract to design a dynamic stochastic general equilibrium model. In this model the interest rate is no longer considered as a monetary policy instrument and the focus is on the impact of economic shocks on the Dynam...
Endpoint detection using weighted finite state transducer
In this paper, we discuss the possibility of applying a weighted finite state transducer (WFST) as a unified framework to solve the endpoint detection problem. In general, endpoint detection is composed of two cascaded decision processes. The first process is voice activity detection (VAD), which makes frame-level speech/non-speech classification. The second process is utterance-level detection which m...
Voice activity detection based on statistical models and machine learning approaches
Voice activity detectors (VADs) based on statistical models have shown impressive performance, especially when fairly precise statistical models are employed. Moreover, the accuracy of a VAD utilizing statistical models can be significantly improved when machine-learning techniques are adopted to provide prior knowledge of speech characteristics. In the first part of this paper, we intro...
Publication date: 2010